[fix](be) Fix json contains duplicate array candidates#63301

Open
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:fix-json-contains-duplicate-array-candidates

@mrhhsg
Member

@mrhhsg mrhhsg commented May 15, 2026

What problem does this PR solve?

Issue Number: DORIS-25573

Related PR: None

Problem Summary: json_contains('[1,1,1]', '[1,1]') should return true, but the previous JSONB array containment logic counted matching target elements and compared that count with the candidate array length. Duplicate candidate elements therefore produced incorrect false results. This change checks each candidate array element independently against the target array, matching non-consuming containment semantics.
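The difference between the two approaches can be sketched on flat integer arrays. This is a minimal illustration, not the Doris code: `contains_by_count` is a hypothetical reconstruction of a counting approach based only on the summary above, and `contains_each` is an assumed shape for the non-consuming check. Both names are invented for this example.

```cpp
#include <algorithm>
#include <cassert>  // used by the usage assertions shown after this block
#include <cstddef>
#include <vector>

// Hypothetical counting approach: count target elements that match any
// candidate element, then compare the count with the candidate length.
// Duplicates make the count drift away from candidate.size().
bool contains_by_count(const std::vector<int>& target,
                       const std::vector<int>& candidate) {
    std::size_t matched = 0;
    for (int t : target) {
        if (std::find(candidate.begin(), candidate.end(), t) != candidate.end()) {
            ++matched;
        }
    }
    return matched == candidate.size();
}

// Non-consuming approach: every candidate element must be found somewhere
// in the target. A target element may satisfy any number of candidates.
bool contains_each(const std::vector<int>& target,
                   const std::vector<int>& candidate) {
    return std::all_of(candidate.begin(), candidate.end(), [&](int c) {
        return std::find(target.begin(), target.end(), c) != target.end();
    });
}
```

With target `{1, 1, 1}` and candidate `{1, 1}`, the counting sketch matches three target elements against a candidate length of two and returns false, while `contains_each` returns true because each candidate `1` is found independently.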

Release note

Fix json_contains returning false when the candidate array contains duplicate elements already present in the target array.

Check List (For Author)

  • Test: Unit Test
    • ./run-be-ut.sh --run --filter=JsonbContainsTest.* -j 32
    • build-support/check-format.sh
    • git diff --check 66dbb85fe3deaf9069bc58f78446fe998ba5810b..HEAD
    • build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (failed because clang-tidy could not analyze existing headers in this environment: be/src/core/types.h has an unmatched NOLINTEND, and the toolchain could not resolve stddef.h/intrinsic headers)
  • Behavior changed: Yes
    • json_contains now follows non-consuming array containment semantics for duplicate candidate elements.
  • Does this need documentation: No

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg
Member Author

mrhhsg commented May 15, 2026

/review

Contributor

@github-actions github-actions Bot left a comment

Code review completed. I did not find blocking issues in this PR.

Critical checkpoint conclusions:

  • Goal and proof: The PR fixes JSONB array containment so duplicate candidate elements are checked independently rather than consuming/counting target elements. The implementation matches the intended non-consuming semantics and adds focused BE unit coverage for duplicates, unordered array candidates, negative cases, and recursive containment.
  • Scope/minimality: The change is small and localized to JsonbValue::contains array handling plus a focused unit test.
  • Concurrency/lifecycle: Not applicable. The modified path is read-only value traversal and introduces no shared mutable state, locks, threads, static initialization dependencies, or ownership lifecycle changes.
  • Configuration/compatibility/protocol: Not applicable. No config, storage format, persisted metadata, FE-BE protocol, or function signature changes are introduced.
  • Parallel code paths: The changed helper is used by the existing JSONB json_contains execution path; I did not find another changed path that needs the same fix.
  • Conditional checks/error handling: No new speculative error swallowing is introduced. Existing document validation and function error propagation remain unchanged.
  • Test coverage: The new BE unit test covers the regression and recursive behavior. I attempted ./run-be-ut.sh --run --filter=JsonbContainsTest.* -j 32, but this runner failed during BE UT configuration because thirdparty/installed/bin/protoc is missing, before running the test.
  • Observability: Not applicable for this pure comparison helper; no new logging/metrics needed.
  • Transactions/persistence/data writes: Not applicable.
  • Performance: Complexity remains consistent with the previous array containment approach and the code avoids new allocations; no blocking performance regression found.
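
The duplicate, unordered, negative, and recursive cases listed above can be sketched with a toy value type. This is an illustration under stated assumptions, not `JsonbValue::contains`: the `Value` struct and `contains_recursive` are hypothetical, only integers and arrays are modeled, and the rule that a candidate element may be contained in any target element follows MySQL-style JSON_CONTAINS semantics, which may differ in detail from Doris.

```cpp
#include <algorithm>
#include <cassert>  // used by the usage assertions shown after this block
#include <utility>
#include <vector>

// Hypothetical minimal JSON-like value: an integer scalar or an array.
struct Value {
    bool is_array = false;
    int scalar = 0;
    std::vector<Value> items;

    static Value of(int v) {
        Value x;
        x.scalar = v;
        return x;
    }
    static Value array(std::vector<Value> vs) {
        Value x;
        x.is_array = true;
        x.items = std::move(vs);
        return x;
    }
};

// Non-consuming recursive containment (assumed MySQL-style rules).
bool contains_recursive(const Value& target, const Value& candidate) {
    if (!target.is_array) {
        // A scalar target contains only an equal scalar.
        return !candidate.is_array && target.scalar == candidate.scalar;
    }
    auto in_some_target_item = [&](const Value& c) {
        return std::any_of(target.items.begin(), target.items.end(),
                           [&](const Value& t) { return contains_recursive(t, c); });
    };
    if (candidate.is_array) {
        // Each candidate element is checked on its own, so duplicates
        // and ordering in the candidate do not matter.
        return std::all_of(candidate.items.begin(), candidate.items.end(),
                           in_some_target_item);
    }
    // A scalar candidate must be contained in some target element.
    return in_some_target_item(candidate);
}
```

Like the approach described in the review, this sketch allocates nothing during comparison; for flat arrays it performs at most one comparison per pair of target and candidate elements.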

User focus points: No additional user-provided review focus was present.

@mrhhsg
Member Author

mrhhsg commented May 15, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31150 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2b1fa2a5499f292f26bf7ccd5ab7ce3a36bf7750, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17624	3859	3875	3859
q2	q3	10799	1393	803	803
q4	4686	469	357	357
q5	7708	2254	2140	2140
q6	241	178	143	143
q7	972	766	629	629
q8	9338	1613	1629	1613
q9	5553	4938	4897	4897
q10	6468	2063	1764	1764
q11	444	288	251	251
q12	695	431	300	300
q13	18169	3579	2828	2828
q14	266	257	233	233
q15	q16	819	768	699	699
q17	1010	1024	882	882
q18	6803	5627	5616	5616
q19	1194	1363	1115	1115
q20	504	404	257	257
q21	5621	2565	2455	2455
q22	421	370	309	309
Total cold run time: 99335 ms
Total hot run time: 31150 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4179	4085	4091	4085
q2	q3	4483	4923	4291	4291
q4	2104	2221	1414	1414
q5	4361	4257	4264	4257
q6	225	172	128	128
q7	1861	2084	1745	1745
q8	2536	2166	2170	2166
q9	7926	8026	7672	7672
q10	4537	4479	4102	4102
q11	565	412	389	389
q12	762	754	538	538
q13	3358	3581	2939	2939
q14	303	289	280	280
q15	q16	701	731	626	626
q17	1337	1329	1421	1329
q18	7705	7253	7157	7157
q19	1181	1172	1108	1108
q20	2196	2193	1902	1902
q21	5308	4651	4511	4511
q22	514	478	406	406
Total cold run time: 56142 ms
Total hot run time: 51045 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 168753 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2b1fa2a5499f292f26bf7ccd5ab7ce3a36bf7750, data reload: false

query5	4338	652	515	515
query6	344	223	201	201
query7	4260	576	316	316
query8	330	235	215	215
query9	8821	4006	4033	4006
query10	445	343	293	293
query11	5818	2378	2233	2233
query12	178	133	124	124
query13	1283	611	432	432
query14	5949	5430	5091	5091
query14_1	4388	4386	4379	4379
query15	214	209	183	183
query16	1015	508	462	462
query17	1183	771	646	646
query18	2695	504	376	376
query19	226	217	175	175
query20	141	134	134	134
query21	224	140	118	118
query22	13560	13575	13337	13337
query23	17138	16409	16045	16045
query23_1	16137	16100	16228	16100
query24	7383	1773	1287	1287
query24_1	1297	1295	1311	1295
query25	533	483	415	415
query26	1322	315	170	170
query27	2657	556	338	338
query28	4362	1938	1912	1912
query29	945	623	523	523
query30	308	239	200	200
query31	1104	1068	945	945
query32	85	75	72	72
query33	520	354	295	295
query34	1146	1167	619	619
query35	742	767	661	661
query36	1320	1348	1157	1157
query37	150	104	95	95
query38	3193	3123	3083	3083
query39	926	934	902	902
query39_1	901	876	877	876
query40	231	143	126	126
query41	66	64	64	64
query42	109	107	112	107
query43	329	333	279	279
query44	
query45	212	200	191	191
query46	1049	1218	735	735
query47	2361	2347	2206	2206
query48	399	393	289	289
query49	637	490	382	382
query50	992	338	242	242
query51	4296	4332	4225	4225
query52	103	107	95	95
query53	255	279	209	209
query54	331	279	252	252
query55	90	90	82	82
query56	311	319	307	307
query57	1384	1390	1330	1330
query58	292	268	263	263
query59	1530	1603	1430	1430
query60	315	326	310	310
query61	157	154	151	151
query62	667	616	552	552
query63	253	205	204	204
query64	2344	839	696	696
query65	
query66	1699	488	370	370
query67	30032	29341	29834	29341
query68	
query69	483	335	316	316
query70	1008	1027	955	955
query71	304	280	277	277
query72	3191	2893	2476	2476
query73	804	718	409	409
query74	5069	4898	4759	4759
query75	2654	2599	2274	2274
query76	2263	1131	748	748
query77	402	405	331	331
query78	12244	12196	11650	11650
query79	1449	1046	768	768
query80	1057	549	446	446
query81	512	283	243	243
query82	1383	158	121	121
query83	341	277	244	244
query84	262	140	105	105
query85	948	540	458	458
query86	437	366	334	334
query87	3421	3408	3189	3189
query88	3508	2658	2639	2639
query89	446	392	339	339
query90	1784	181	177	177
query91	181	168	148	148
query92	81	78	75	75
query93	1457	1492	881	881
query94	626	367	318	318
query95	695	479	340	340
query96	1006	773	353	353
query97	2704	2680	2542	2542
query98	236	225	229	225
query99	1124	1089	973	973
Total cold run time: 253147 ms
Total hot run time: 168753 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (21/21) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.62% (27819/37789)
Line Coverage 57.57% (301510/523694)
Region Coverage 54.82% (252087/459818)
Branch Coverage 56.34% (108929/193344)

@mrhhsg mrhhsg marked this pull request as ready for review May 16, 2026 01:27